153 research outputs found
BioHEL: Bioinformatics-oriented Hierarchical Evolutionary Learning
This technical report briefly describes our recent work on the iterative rule learning (IRL) approach to evolutionary learning/genetics-based machine learning. This approach was initiated by the SIA system. A more recent example is HIDER. Our approach integrates some of the main characteristics of GAssist, a system belonging to the Pittsburgh approach of Evolutionary Learning, into the general framework of IRL. Our aim in developing this system is to use all the good characteristics of GAssist while overcoming some of the scalability limitations that it presents.
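As a rough illustration of the IRL idea (learn one rule, remove the examples it covers, repeat on the remainder), here is a minimal Python sketch. It is not BioHEL: the evolutionary search for each rule is swapped for a trivial exhaustive search over single-interval rules, and all names (`learn_one_rule`, `iterative_rule_learning`, `predict`) are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Example = Tuple[Sequence[float], int]   # (feature vector, class label)
Rule = Tuple[int, float, float, int]    # (feature index, low, high, predicted class)


def matches(rule: Rule, x: Sequence[float]) -> bool:
    """A rule matches when the chosen feature falls inside its interval."""
    idx, low, high, _ = rule
    return low <= x[idx] <= high


def learn_one_rule(examples: List[Example]) -> Optional[Rule]:
    """Stand-in for the evolutionary search (assumption): try one interval
    per (feature, class) pair and keep the rule with the best score."""
    best_rule, best_score = None, 0
    n_features = len(examples[0][0])
    for idx in range(n_features):
        for label in sorted({y for _, y in examples}):
            values = [x[idx] for x, y in examples if y == label]
            rule = (idx, min(values), max(values), label)
            covered = [y for x, y in examples if matches(rule, x)]
            # Score: correctly covered minus incorrectly covered examples.
            score = sum(1 if y == label else -1 for y in covered)
            if score > best_score:
                best_rule, best_score = rule, score
    return best_rule


def iterative_rule_learning(examples: List[Example]) -> List[Rule]:
    """Build an ordered rule list by repeatedly learning one rule and
    discarding the examples that rule covers."""
    rules, remaining = [], list(examples)
    while remaining:
        rule = learn_one_rule(remaining)
        if rule is None:          # no rule with a positive score is left
            break
        rules.append(rule)
        remaining = [(x, y) for x, y in remaining if not matches(rule, x)]
    return rules


def predict(rules: List[Rule], x: Sequence[float], default: int) -> int:
    """The first matching rule decides; otherwise fall back to the default."""
    for rule in rules:
        if matches(rule, x):
            return rule[3]
    return default
```

Because each rule is learned only on the examples that earlier rules left uncovered, the result is an ordered list in which the first matching rule decides the prediction.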
An efficient decision rule-based system for the protein residue-residue contact prediction
Protein structure prediction remains one of the
most important challenges in molecular biology. Contact maps
have been extensively used as a simplified representation of
protein structures. In this work, we propose a multi-objective
evolutionary approach for contact map prediction. The proposed
method bases the prediction on a set of physico-chemical properties and structural features of the amino acids, as well as
evolutionary information in the form of an amino acid position
specific scoring matrix (PSSM). The proposed technique produces
a set of decision rules that identify contacts between amino acids.
Results obtained by our approach are presented and confirm the
validity of our proposal. Junta de Andalucía P07-TIC-02611; Ministerio de Educación y Ciencia TIN2011-28956-C02-0
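For readers unfamiliar with the contact map representation mentioned in this abstract, the following sketch computes one from known 3D coordinates. It shows only the target representation, not the rule-based predictor described above. The 8 Å threshold is a commonly used convention assumed here, and the function name `contact_map` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence


def contact_map(coords: Sequence[Sequence[float]],
                threshold: float = 8.0,
                min_separation: int = 1) -> List[List[int]]:
    """Return a symmetric 0/1 matrix where entry (i, j) is 1 when residues
    i and j lie within `threshold` of each other.

    coords: one (x, y, z) point per residue, e.g. C-alpha or C-beta atoms.
    threshold: distance cutoff (assumed 8.0, in the units of `coords`).
    min_separation: minimum sequence distance |i - j| to consider.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap
```

A predictor such as the one in the abstract would try to estimate each entry of this matrix from sequence-derived features rather than from coordinates.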
Towards low-carbon conferencing : acceptance of virtual conferencing solutions and other sustainability measures in the ALIFE community
The latest report from the Intergovernmental Panel on Climate Change (IPCC) estimated that humanity has a time window of about 12 years to prevent anthropogenic climate change of catastrophic magnitude. Greenhouse gas emissions from air travel, which are currently rising, are possibly one of the factors that can be most readily reduced. Within this context, we advocate for the re-design of academic conferences in order to decrease their environmental footprint. Today, virtual technologies hold the promise to substitute many forms of physical interaction and are increasingly making their way into conferences to reduce the number of travelling delegates. Here, we present the results of a survey in which we gathered the opinions of academics worldwide on this topic. Results suggest there is ample room for challenging the (dangerous) business-as-usual inertia of scientific lifestyle.
The intersection of evolutionary computation and explainable AI.
In the past decade, Explainable Artificial Intelligence (XAI) has attracted great interest in the research community, motivated by the need for explanations in critical AI applications. Some recent advances in XAI are based on Evolutionary Computation (EC) techniques, such as Genetic Programming. We call this trend EC for XAI. We argue that the full potential of EC methods has not yet been exploited in XAI, and call on the community for future efforts in this field. Likewise, we find that there is a growing concern in EC regarding the explanation of population-based methods, i.e., their search process and outcomes. While some attempts have been made in this direction (although, in most cases, those are not explicitly put in the context of XAI), we believe that there are still several research opportunities and open research questions that, in principle, may promote a safer and broader adoption of EC in real-world applications. We call this trend XAI within EC. In this position paper, we briefly overview the main results in the two above trends, and suggest that the EC community may play a major role in the achievement of XAI.
Enhancing the scalability of a genetic algorithm to discover quantitative association rules in large-scale datasets
Association rule mining is a well-known methodology to discover significant and apparently hidden relations among attributes in a subspace of instances from datasets. Genetic algorithms have been extensively used to find interesting association rules. However, the rule-matching task of such techniques usually demands substantial computational and memory resources. The use of efficient computational techniques has become a task of the utmost importance due to the high volume of data generated nowadays. Hence, this paper aims at improving the scalability of quantitative association rule mining techniques based on genetic algorithms to handle large-scale datasets without quality loss in the results obtained. For this purpose, a new representation of the individuals, new genetic operators and a windowing-based learning scheme are proposed to successfully accomplish this challenging task. Specifically, the proposed techniques are integrated into the multi-objective evolutionary algorithm named QARGA-M to assess their performance. Both the standard version and the enhanced one of QARGA-M have been tested on several datasets with different numbers of attributes and instances. Furthermore, the proposed methodologies have been integrated into other existing techniques based on genetic algorithms to discover quantitative association rules. The comparative analysis performed shows significant improvements of QARGA-M and other existing genetic algorithms in terms of computational costs without losing quality in the results when the proposed techniques are applied. Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Junta de Andalucía TIC-7528; Junta de Andalucía P12-TIC-1728; Universidad Pablo de Olavide APPB81309
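To make the rule-matching cost mentioned in this abstract concrete, the sketch below evaluates a single quantitative association rule, whose conditions are numeric intervals, against a dataset. It uses the standard definitions of support and confidence; it is not the QARGA-M representation, and the names `covers` and `support_confidence` are hypothetical.

```python
from typing import Dict, List, Tuple

Intervals = Dict[str, Tuple[float, float]]   # attribute name -> (low, high)
Row = Dict[str, float]


def covers(intervals: Intervals, row: Row) -> bool:
    """True when every attribute in `intervals` lies within its bounds."""
    return all(low <= row[attr] <= high
               for attr, (low, high) in intervals.items())


def support_confidence(antecedent: Intervals,
                       consequent: Intervals,
                       rows: List[Row]) -> Tuple[float, float]:
    """Support = fraction of rows matching antecedent and consequent.
    Confidence = matching rows divided by rows matching the antecedent."""
    n_antecedent = sum(1 for r in rows if covers(antecedent, r))
    n_both = sum(1 for r in rows
                 if covers(antecedent, r) and covers(consequent, r))
    support = n_both / len(rows) if rows else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence
```

Each evaluation scans every row, so the cost grows with both the number of instances and the number of candidate rules in the population, which is the kind of overhead a windowing scheme (evaluating on a subset of instances at a time) is meant to reduce.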
Contact map prediction using a large-scale ensemble of rule sets and the fusion of multiple predicted structural features
Motivation: The prediction of a protein’s contact map has become,
in recent years, a crucial stepping stone for the prediction of the
complete 3D structure of a protein. In this article, we describe a
methodology for this problem that was shown to be successful in
CASP8 and CASP9. The methodology is based on (i) the fusion of the
prediction of a variety of structural aspects of protein residues, (ii)
an ensemble strategy used to facilitate the training process and
(iii) a rule-based machine learning system from which we can
extract human-readable explanations of the predictor and derive
useful information about the contact map representation.
Results: The main part of the evaluation is the comparison against
the sequence-based contact prediction methods from CASP9,
where our method presented the best rank in five out of the six
evaluated metrics. We also assess the impact of the size of the
ensemble used in our predictor to show the trade-off between
performance and training time of our method. Finally, we also study
the rule sets generated by our machine learning system. From this
analysis, we are able to estimate the contribution of the attributes in
our representation and how these interact to derive contact
prediction.
Application of machine learning to proteomics data: classification and biomarker identification in postgenomics biology
Mass spectrometry is an analytical technique for the characterization of biological samples and is increasingly used in omics studies because of its targeted, nontargeted, and high-throughput capabilities. However, due to the large datasets generated, it requires informatics approaches such as machine learning techniques to analyze and interpret relevant data. Machine learning can be applied to MS-derived proteomics data in two ways: first, directly to mass spectral peaks and, second, to proteins identified by sequence database searching, although relative protein quantification is required for the latter. Machine learning has been applied to mass spectrometry data from different biological disciplines, particularly for various cancers. The aims of such investigations have been to identify biomarkers and to aid in diagnosis, prognosis, and treatment of specific diseases. This review describes how machine learning has been applied to proteomics tandem mass spectrometry data. This includes how it can be used to identify proteins suitable for use as biomarkers of disease and for classification of samples into disease or treatment groups, which may be applicable for diagnostics. It also includes the challenges faced by such investigations, such as prediction of proteins present, protein quantification, planning for the use of machine learning, and small sample sizes.